Definition 10.1.1: The Statistical Bond
Two variables $X$ and $Y$ are considered related if there is any change in the conditional distribution of $Y$, given $X = x$, as $x$ changes. Conversely, a state of "no relationship" is mathematically equivalent to the independence of $X$ and $Y$.
Variables $X$ and $Y$ are unrelated if and only if $f(y|x) = f(y)$ for all values of $x$ and $y$. Equivalently, the joint relative frequency function factors as:
$$f(x, y) = f(x)f(y)$$
Therefore, testing for a relationship amounts to testing for independence.
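The factorization criterion can be checked directly when the joint function is fully known. The following is a minimal sketch, not a statistical test on sample data: the helper names `marginals` and `is_independent` and both joint tables are hypothetical, chosen only to illustrate the definition.

```python
from itertools import product


def marginals(joint):
    """Return the marginals f(x) and f(y) of a joint table {(x, y): f(x, y)}."""
    fx, fy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0.0) + p
        fy[y] = fy.get(y, 0.0) + p
    return fx, fy


def is_independent(joint, tol=1e-9):
    """True if f(x, y) = f(x) f(y) for every pair (x, y), up to rounding."""
    fx, fy = marginals(joint)
    return all(
        abs(joint.get((x, y), 0.0) - fx[x] * fy[y]) <= tol
        for x, y in product(fx, fy)
    )


# Hypothetical joint tables (illustrative values only).
unrelated = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}
related = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.50}

print(is_independent(unrelated))
print(is_independent(related))
```

In the first table every cell equals the product of its marginals, so $X$ and $Y$ are unrelated. In the second, $f(0, 0) = 0.30$ while $f(x{=}0)f(y{=}0) = 0.4 \times 0.4 = 0.16$, so the variables are related.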
Mechanisms of Change
A relationship is identified by any shift in the conditional density function (as shown in Figure 10.1.1). This includes:
- Mean Shift: The expected value $E(Y|X)$ changes (the most common focus).
- Variance Shift: The spread or uncertainty of $Y$ depends on $X$ (Heteroscedasticity).
- Shape Change: The overall distribution transforms (e.g., from symmetric to skewed).
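The three kinds of change listed above can be illustrated by simulation. This is a sketch under assumed models that are not taken from the text: for a mean shift, $Y \mid X = x$ is normal with mean $2x$; for a variance shift, normal with standard deviation $1 + 2x$; for a shape change, normal with mean 1 and variance 1 when $x = 0$ and exponential with mean 1 and variance 1 when $x = 1$. The function names `draw_y` and `summarize` are hypothetical.

```python
import random
import statistics


def draw_y(kind, x, rng):
    """Draw one value of Y given X = x under an assumed illustrative model."""
    if kind == "mean":
        return rng.gauss(2.0 * x, 1.0)        # only the mean depends on x
    if kind == "variance":
        return rng.gauss(0.0, 1.0 + 2.0 * x)  # only the spread depends on x
    if kind == "shape":
        # Same mean (1) and variance (1) for both x, but different shape.
        return rng.gauss(1.0, 1.0) if x == 0 else rng.expovariate(1.0)
    raise ValueError(f"unknown kind: {kind}")


def summarize(kind, x, n=20000, seed=1):
    """Return the sample mean, variance, and skewness of Y given X = x."""
    rng = random.Random(seed)
    ys = [draw_y(kind, x, rng) for _ in range(n)]
    m = statistics.fmean(ys)
    v = statistics.pvariance(ys, mu=m)
    skew = statistics.fmean([((y - m) / v ** 0.5) ** 3 for y in ys])
    return m, v, skew


for kind in ("mean", "variance", "shape"):
    for x in (0, 1):
        m, v, skew = summarize(kind, x)
        print(f"{kind:8s} x={x}: mean={m:.2f} var={v:.2f} skew={skew:.2f}")
```

The shape-change case is the instructive one: the conditional mean and variance are approximately equal for both values of $x$, yet the skewness differs, so $X$ and $Y$ are still related under the definition above.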
Establishing Causality through Design
A statistical relationship does not by itself imply causality. To claim that $X$ causes $Y$, we must rule out confounding variables, which is the purpose of experimental design:
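- Control Treatments: Provide a baseline for comparison.
- Placebo Effect: Subjects may improve merely because they believe they are being treated. Giving the control group an inactive treatment (a placebo) allows this effect to be separated from the effect of the treatment itself.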
- Blinding: Using blind experiments (recipients do not know which treatment they receive) and double-blind experiments (neither recipients nor researchers know) to reduce bias arising from expectations.
- Blocking: As seen in Example 10.1.7, we use blocking variables ($W$, like soil fertility) to ensure the relationship between wheat type ($X$) and yield ($Y$) is not confounded by pre-existing conditions.
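The role of the blocking variable can be sketched with a small numerical illustration. All yields below are invented for this purpose and do not come from Example 10.1.7. They are constructed so that wheat type has no effect within a fertility level, but type A was planted mostly on high-fertility plots. The helper name `mean_yield_by` is hypothetical.

```python
from collections import defaultdict
from statistics import fmean

# Invented records: (wheat type X, fertility block W, yield Y).
plots = [
    ("A", "high", 50), ("A", "high", 52), ("A", "high", 51), ("A", "low", 30),
    ("B", "high", 51), ("B", "low", 31), ("B", "low", 29), ("B", "low", 30),
]


def mean_yield_by(plots, key):
    """Average the yield Y over groups defined by key(x, w)."""
    groups = defaultdict(list)
    for x, w, y in plots:
        groups[key(x, w)].append(y)
    return {k: fmean(v) for k, v in groups.items()}


# Ignoring W: type A appears better only because it got the fertile plots.
pooled = mean_yield_by(plots, lambda x, w: x)

# Comparing within each block of W: the apparent difference disappears.
within = mean_yield_by(plots, lambda x, w: (w, x))

print(pooled)
print(within)
```

Pooled over all plots, the mean yields of A and B differ, yet within each fertility block they agree. The pooled difference is produced entirely by $W$. A blocked design avoids this by comparing wheat types within blocks of similar fertility.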